A Foreground Inference Network for Video Surveillance Using Multi-View Receptive Field

نویسنده

  • Akilan Thangarajah
چکیده

Foreground (FG) pixel labeling plays a vital role in video surveillance. Recent engineering solutions have attempted to exploit the efficacy of deep learning (DL) models initially targeted for image classification to deal with FG pixel labeling. One major drawback of such strategy is the lacking delineation of visual objects when training samples are limited. To grapple with this issue, we introduce a multi-view receptive field fully convolutional neural network (MV-FCN) that harness recent seminal ideas, such as, fully convolutional structure, inception modules, and residual networking. Therefrom, we implement a system in an encoder-decoder fashion that subsumes a core and two complementary feature flow paths. The model exploits inception modules at early and late stages with three different sizes of receptive fields to capture invariance at various scales. The features learned in the encoding phase are fused with appropriate feature maps in the decoding phase through residual connections for achieving enhanced spatial representation. These multi-view receptive fields and residual feature connections are expected to yield highly generalized features for an accurate pixel-wise FG region identification. It is, then, trained with database specific exemplary segmentations to predict desired FG objects. The comparative experimental results on eleven benchmark datasets validate that the proposed model achieves very competitive performance with the priorand state-of-the-art algorithms. We also report that how well a transfer learning approach can be useful to enhance the performance of our proposed MV-FCN. keywordsDeep learning, Foreground-background clustering, Video-surveillance

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Pedestrians Tracking in a Camera Network

With the increase of the number of cameras installed across a video surveillance network, the ability of security staffs to attentively scan all the video feeds actually decreases. Therefore, the need for an intelligent system that operates as a tracking system is vital for security personnel to do their jobs well. Tracking people as they move through a camera network with non-overlapping field...

متن کامل

Wide-Baseline Multi-View Video Segmentation For 3D Reconstruction

Obtaining a foreground silhouette across multiple views is one of the fundamental steps in 3D reconstruction. In this paper we present a novel video segmentation approach, to obtain a foreground silhouette, for scenes captured by a wide-baseline camera rig given a sparse manual interaction in a single view. The algorithm is based on trimap propagation, a framework used in video matting. Bayesia...

متن کامل

Pedestrians Tracking in a Camera Network

With the increase of the number of cameras installed across a video surveillance network, the ability of security staffs to attentively scan all the video feeds actually decreases. Therefore, the need for an intelligent system that operates as a tracking system is vital for security personnel to do their jobs well. Tracking people as they move through a camera network with non-overlapping field...

متن کامل

Receptive Field Encoding Model for Dynamic Natural Vision

Introduction: Encoding models are used to predict human brain activity in response to sensory stimuli. The purpose of these models is to explain how sensory information represent in the brain. Convolutional neural networks trained by images are capable of encoding magnetic resonance imaging data of humans viewing natural images. Considering the hemodynamic response function, these networks are ...

متن کامل

Multiple View Video Streaming and 3d Scene Reconstruction for Traffic Surveillance

We present a 3D modeling system for traffic surveillance applications. The application contains several cameras positioned around a traffic scene. The video signals are compressed and streamed to a central client. At the central client a scene reconstruction is carried out, using all 2D camera views utilizing 3D camera calibration information. To combine all views into a 3D model, moving object...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1801.06593  شماره 

صفحات  -

تاریخ انتشار 2018